ABSTRACT

The technical feasibility of a continental
information network for astronomy has been demonstrated in the course
of a two-month experiment conducted jointly by Dearborn Observatory
of Northwestern University and the Stanford University Computation
Center. The experiment simulated a scientific information network
based on a high-level retrieval language of the non-procedural type
named DIRAC. A data-base of astronomical catalogues was maintained in
Palo Alto, California, and was queried remotely by a team of 
astronomers in Illinois. The relevant parameters of approximately one 
hundred time-sharing sessions were thus recorded. Analysis of the 
experiments in terms of operating system efficiency, user interface 
and cost effectiveness supports the idea that the network concept is 
basic to meaningful scientific documentation systems; it also 
indicates that generalized software is the key to cost-effective 
information retrieval in the environment considered and, by 
extension, in a variety of scientific areas that rely on a 
combination of bibliographic and catalogued information with a high 
degree of internal structure. The article reviews the problems of 
astronomical data structures in their relevance to language design 
and to the general problem of scientific information handling, and it 
discusses the various factors: administrative, computational and 
psychological, that will affect the implementation of future 
networks. (Author) 
This is the second report on the research in progress in the
Information Systems group at Stanford University Computation Center. The 
first report, entitled: 

THE DIRAC LANGUAGE: CONCEPTS & FACILITIES 

was released in May 1970 and can be obtained from the author. 

The third and final report in this Series will be entitled: 



MEDICAL DATA-MANAGEMENT IN TIME-SHARING: 
Findings of the DIRAC Project 



and is scheduled for release in November 1970. 
INTRODUCTION



Since the advent of time-sharing in the mid-sixties, considerable 
attention has been devoted to the concept of a documentation service 
used simultaneously by many specialists. Systems of this type have been 
designed, and some have been partially implemented, to explore the 
feasibility of this concept in such environments as library automation, 
medical literature retrieval and social science. Many obstacles have 
been identified in this field, and the results to date have been
largely negative, with notable exceptions involving dedicated systems 
whose objectives were limited enough to be met by classical software. 

There exists a class of users whose need for interactive data
management has been largely ignored by the designers of these early
systems: in the course of their research activities, professional
scientists do not really need access to literature references as much
as they need the capability to interact meaningfully with large
observational records, standard tables, and private files of
instrumental data. A research physicist, for instance, is a non-typical
user from the point of view of current retrieval philosophies, because
he is constantly consulting and operating on scientific catalogs (such
as tables of standard wavelengths) but is only marginally interested
in literature search, in view of the unreliability of keyword systems
in a fast-changing linguistic environment, and in view of the rapid
obsolescence of the information itself. Large institutions, on the
other hand, are devoted to the preparation, publication and maintenance
of scientific records, but they themselves do not have at their disposal
an adequate computing tool of any generality. Clearly this is an aspect
of data management that lends itself to automation, and yet computing
facilities have so far been unable to respond creatively to these needs.
Computer languages aimed at the scientific user have been designed
exclusively with computation in mind, FORTRAN and ALGOL being the most
common examples.
The object of the Report is to present the findings of this
project and to analyze the reactions of the user community at the national
and international level. The experiment was conducted in the field of
astronomy, where it could be contrasted usefully with the concept of a
"National Data Center" currently under consideration by specialists.
However, these findings will be seen to have general validity: the
transferability of the features discussed here to other scientific
fields is inherent in the non-procedural concept that allows users to
create, update and query files without programmer intervention.

A special objective of the study was to identify and measure for 
the first time the parameters affecting the performance of a generalized 
retrieval system in the time-sharing environment, covering three main 
areas of activity: 

1) Catalogue preparation, data gathering, editing and updating.

2) Interactive retrieval of statistical and individual data about
astronomical objects.

3) Bibliography search.



ACKNOWLEDGMENTS 

Much of the work described here has been performed with the help of
astronomers at Northwestern University who have contributed their time
and ideas during a two-month long-distance experiment sponsored by the
Stanford Computation Center. In particular, the support of Professor
J. Allen Hynek, Director of Dearborn Observatory, and of Mr. Lloyd
Wackerling, of the Astronomy Department, is very gratefully
acknowledged. Dr. G. de Vaucouleurs, of the University of Texas, made
available a tape containing his catalogue of Bright Galaxies. Messrs.
Mack, Schwartz, Sargent, Shapiro, Rybski, and Dr. James Wray have used
the sample data-base, which they helped to implement, and they have
offered many valuable comments reflected in the conclusions of this
Report. Finally, we are greatly indebted to Mr. Roderick Fredrickson
(now with the RAND Corporation) and to the entire Systems Staff at
Stanford for making this experiment possible and exciting.



DIRAC AND ASTRONOMICAL DATA RETRIEVAL



Dr. Jacques F. Vallee, Manager
Information Systems
Computation Center
Stanford University

Dr. J. Allen Hynek, Chairman
Department of Astronomy
Northwestern University
Evanston, Illinois




(Presented at the 1970 Convention of the Association for
Computing Machinery, New York, 7 Sept. 1970. This article will
appear in the Proceedings of ACM'70, to be published early in 1971)






DIRAC AND ASTRONOMICAL DATA RETRIEVAL
Jacques F. Vallee and J. Allen Hynek



I. INTRODUCTION 

This article does not present general statements about information 
retrieval, automatic documentation, or the potential significance of 
information networks to the work patterns of scientists, although it
touches on these three subjects. Instead, we take the point of view of 
the explorer returning from a previously unknown territory with a little 
more technical knowledge, a greatly increased puzzlement and a 
considerable amount of fresh curiosity. 

In other words, we have not come here to offer any predictions 
concerning the future impact of computers on science, but only to 
illustrate what happens when information-oriented software - in this 
case, the DIRAC language - is made available to a group of scientific 
researchers - professional astronomers - and is used by them to access 
data that previously could only be found in books, catalogues and
professional journals. 

This illustration will not take the form of a theoretical model or an 
hypothetical situation, but will be made by reference to an actual 
prototype implementation of an astronomical information network. The 
parameters of this network have been identified and measured during a 
six-week on-line experiment between Dearborn Observatory in Illinois and 
the Stanford Computation Center in California. Both the concept of
scientific information networks and the analysis of the implementation 
in terms of computer systems will be discussed in this report. First, we 
should describe more closely the environment of astronomical information 
and show why an experiment in this limited area can have general 
validity, and to what extent our observations are relevant in a 
discussion of information systems of the next decade. 

II. ASTRONOMICAL DATA STRUCTURES 

The information explosion takes a peculiar form in Astronomy. An
increasingly large volume of data is published every year in papers,
articles, monographs and books. Among these data one finds new
measurements of the parameters of stellar systems, new physical values
affecting current theories and the statistical evaluations upon which
they were based. Unfortunately, no global framework is available to
integrate this dynamically changing information; publications are
uncoordinated and reflect personal interest rather than a concerted
effort to gather a homogeneous sample. Treacherous selection effects
are present in all the catalogues, and integration of pertinent units 
from various sources is (in extreme cases) made impossible by the lack 
of a common standard. 
All these remarks had already been made a century ago by Simon
Newcomb, who wrote in his 'Reminiscences of an Astronomer':

The work of all observing astronomers, so far as it 
could be used, must be combined into a single whole. 

But here again difficulties are met at every step. 

There has been, in times past, little or no concert of 
action among astronomers at different observatories. 

The astronomers of each nation, perhaps of each observatory, 
have gone to work in their own way, using discordant data, 
perhaps not always rigidly consistent, even in the data used 
in a single establishment. 

To this we must add an expanding amount of bibliographic information
and general documentation. The sources of measurements for a given
parameter must be clearly identified for a statistical discussion to
have validity. For every adopted physical value, then, one must be able
to analyze and to trace through the entire network of previous
determinations - including their authors and the instrumental techniques
they used. A situation is thus created which lends itself to computer
analysis. At the same time, information interchange is made difficult by
two things: the wide geographic dispersion of teams of astronomers; and
the complexity of the information they use.

On figure 1 we have attempted to display graphically the structure of
the physical information available for one stellar system, the triple
star CASTOR. This naturally excludes all bibliographic information and
shows only currently adopted values. It will be seen that the 'atom' of
information can be explored in six main directions, each one branching
into secondary levels whose structure may be a function of numerical
values at higher levels. In order to adequately represent and process
such structures, the designer of a retrieval system must start from the
user's own terminology without forcing it a priori into a specific
coding scheme. The minimum requirements for a software system supporting
a file of this type would include the ability to handle large quantities
of numeric information in integral form and in real form, as well as the
ability to search natural-language text for keywords: this should be
true at least down to the subfield level, with an unpredictable number
of subfield values in any record. These requirements are not satisfied
by the information-handling capabilities of procedural languages such as
Fortran, Cobol or PL/1.
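The requirements just listed can be made concrete with a small modern
sketch. It is written in Python, which is of course not the language of
the experiment, and it does not reproduce DIRAC's internal file
organization. The class name Record, its methods and all the values
stored below are assumptions introduced for illustration only; the one
fact taken from the text is that CASTOR is a triple star.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# A subfield value may be an integer, a real number, or free text.
Value = Union[int, float, str]


@dataclass
class Record:
    """One catalogue entry (hypothetical layout, not DIRAC's own).

    Each named field holds a list, so the number of subfield values
    is not fixed in advance and may differ from record to record.
    """
    name: str
    fields: Dict[str, List[Value]] = field(default_factory=dict)

    def add(self, field_name: str, value: Value) -> None:
        """Append one more subfield value to the named field."""
        self.fields.setdefault(field_name, []).append(value)

    def has_keyword(self, field_name: str, keyword: str) -> bool:
        """Case-insensitive keyword search over the text subfields."""
        needle = keyword.lower()
        return any(
            isinstance(v, str) and needle in v.lower()
            for v in self.fields.get(field_name, [])
        )


star = Record("CASTOR")
star.add("components", 3)          # integer value: a triple star
star.add("example_real", 0.5)      # placeholder real value, not a measurement
star.add("remarks", "Triple star system")          # illustrative text
star.add("remarks", "Second remark, added later")  # illustrative text
```

A call such as star.has_keyword("remarks", "triple") then searches every
text value stored under that field, however many there are.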

III. THE DIRAC LANGUAGE. 

The problem of complex information structures, that we have just 
mentioned, has been recognized in other fields and a number of new file 
organization techniques is now available to the computer scientist. The 
new software concept of 'non-procedural language' has emerged and is 
beginning to receive wide application in Business, Law, Government and 
military data processing (2). However, the data-base systems currently 
on the market, and based on this concept, have two major drawbacks when
they are placed in a network environment:





Vallee/Hynek - Figure 1

Information Structure for the Star CASTOR 
- These systems are generally adapted from second-generation file
processing techniques and are difficult to integrate within a
remote-console, time-sharing situation.

- They fail to provide a straightforward interface with text-editors
and compilers in their environment.

This places upon the user a new requirement for a degree of 
sophistication which is simply not available in technical personnel
outside computing centers. In turn, this leads to longer training 
periods, decreased user acceptance and considerable costs that cannot be 
justified in the scientific research environment. 

When the idea of an astronomical network experiment was first
discussed early in 1970, we were engaged in the testing of a language
prototype named DIRAC (for DIRect-ACcess; the name also alludes to the
five types of information that it handles: Date, Integer, Real,
Alphanumeric, Coded). We wanted to determine to what degree the
non-procedural language concept (that is primarily business-oriented)
could be extended to support information networks, in particular in the
field of science and of library automation, where we felt dedicated
systems had generally led to disappointing performance in spite of
their high cost and the sophistication of their users.

How can we implement a truly generalized, yet cost-effective
retrieval system? The primary design objectives we proposed were a
straightforward user interface, and complete integration of the language
within a unified command structure in time-sharing mode. The resulting 
system can be best illustrated by following step by step an actual 
interrogation of an astronomical catalogue. 

Figure 2 is an example of the on-line query of the Supernovae 
Catalogue implemented under DIRAC-1. The user is an astronomer who 
studies supernovae in the Virgo cluster. He first wants to know how many 
are false or suspected. The system finds one, and he displays the 
supernova number and the recession velocity, Vs. It will be noted that
DIRAC processes information in both upper and lower case, thus 
simplifying the handling of textual data; this requirement is important 
in the applications we shall consider. 

The user then wants to determine how many true supernovae in Virgo 
have a known Vs. The answer is 13. Restricting the search by use of the 
RETAIN command, he adds the rule:

1000 km/s <= Vs <= 2000 km/s

The answer is 11. Among these, the astronomer wants DIRAC to locate a 
supernova for which the first article given as reference has "Mt. Wilson" 
as its source. DIRAC locates supernova number 1001b. The user is now 
able to have the velocity, galactic coordinates, and all the literature 
about the object typed out on the terminal. 
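The interrogation just described proceeds by successive narrowing: a
first search builds a list of hits, and each further rule keeps only the
hits that also satisfy it. A minimal Python sketch of that pattern
follows. The Session class, its method names and the four toy records
are assumptions made for illustration; they are neither DIRAC syntax nor
entries of the Supernovae Catalogue.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Rule = Callable[[Record], bool]


class Session:
    """Keeps a current hit list that successive rules narrow down."""

    def __init__(self, catalogue: List[Record]) -> None:
        self.catalogue = catalogue
        self.hits: List[Record] = []

    def search(self, rule: Rule) -> int:
        """Start a fresh search over the whole catalogue."""
        self.hits = [r for r in self.catalogue if rule(r)]
        return len(self.hits)

    def retain(self, rule: Rule) -> int:
        """Keep only the current hits that also satisfy the rule."""
        self.hits = [r for r in self.hits if rule(r)]
        return len(self.hits)


# Toy records with made-up values, for illustration only.
catalogue = [
    {"number": "A", "cluster": "Virgo", "status": "true",
     "Vs": 1500.0, "sources": ["Mt. Wilson"]},
    {"number": "B", "cluster": "Virgo", "status": "true",
     "Vs": 2400.0, "sources": ["Other"]},
    {"number": "C", "cluster": "Virgo", "status": "suspected",
     "Vs": None, "sources": []},
    {"number": "D", "cluster": "Coma", "status": "true",
     "Vs": 1200.0, "sources": ["Mt. Wilson"]},
]

s = Session(catalogue)
# True supernovae in Virgo with a known recession velocity.
n_known = s.search(lambda r: r["cluster"] == "Virgo"
                   and r["status"] == "true" and r["Vs"] is not None)
# Restrict the hit list to the velocity range of the rule above.
n_range = s.retain(lambda r: 1000.0 <= r["Vs"] <= 2000.0)
# Keep those whose first reference has "Mt. Wilson" as its source.
n_source = s.retain(lambda r: bool(r["sources"])
                    and r["sources"][0] == "Mt. Wilson")
```

The point of the design is that the user never restates the earlier
conditions: each step operates on the result of the previous one.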
Figure 2: On-line interrogation of an astronomical catalog

IV. THE CONCEPT OF ASTRONOMICAL INFORMATION NETWORK

We now have an astronomical data-base ready, and it is accessible 
through a high-level language. How can such a collection of data be made
available to astronomers? What are their information and documentation
requirements ? These questions can only be answered within the 
astronomical community itself. 

With the development of large-scale numeric calculations (such as 
orbit computation and mathematical modeling) astronomers have become 
increasingly aware both of the potential usefulness and of the serious 
shortcomings of the computing machinery as an information processing 
tool. In 1969, the Astronomical Society of the Pacific expressed 
concern over the problems of data retrieval in astronomy and stated: 

The need for a means of recovering various data on individual stars, 
galaxies, clusters, or other objects is becoming obvious, and 
techniques for compiling, storing and distributing such data have 
been developed in recent years. 

Recommending that such techniques be surveyed, the Society appointed 
a Committee chaired by Dr. Helmut Abt, to assess their usefulness in the 
astronomical environment and " to estimate the needs, procedures and 
cost of such a center". 

In its December 1969 issue, the Society published a Special 
Announcement defining more exactly the role of an astronomical data 
center. Such a center would be 

a computerized storage, retrieval and distribution service for 
published data on stars, clusters, galaxies and perhaps other 
objects. 

Initially the catalogues available on cards or tape, both of 
optical and radio data, would be included: later new catalogues 
or bibliographies would be compiled. When practical the data 
itself will be listed. In other cases only bibliographies will 
be furnished. 

Such a center would provide two services upon request: 

1. Listed information and/or sources for specific objects, e.g., 
the published data on a given star or stars, and 

2. Derived data, e.g. the Southern F-type eclipsing stars in visual 
systems. (3) 

While this study was conducted, Stanford and Northwestern were 
engaged in an experiment designed to demonstrate the feasibility of a 
different concept: rather than proposing as desirable the creation of a 
single organization (the "data center") having the responsibility to 
acquire and maintain the data and to implement the necessary technology, 
we wanted to observe the behavior of professional astronomers placed in 
a 'network' situation. By this we mean that our users in Illinois could 
not only interrogate a data-base in conversational mode but could also
insert new files and update existing files in that data-base without any 
programmer intervention. 
Vallee/Hynek - Figure 3
Scientific Information Network 
According to this concept, one visualizes a network such as the one
shown on Figure 3, based on a small number of basic stations equipped
with remote medium-speed printers. This network would be open on a
subscription basis to additional users who would receive a simple data
terminal connected with a scope. The main advantages of this concept
are:

1. The administration of the network (A) and the center
providing computing power and storage space (B) do not
have to be at the same location.

2. Responsibility for the contents of the information in storage
is shared by all the users who have update privileges, but
A and B do not receive special treatment.

3. Interrogation is conducted on a direct-access basis without
any administrative or technical interference.

4. The network offers a highly efficient communication system
among all the users of the data-base.
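The division of roles in points 2 and 3 can be pictured with a short
Python sketch. It is only a model of the idea, under the assumption that
privileges are held per user; the class SharedDataBase, its methods and
the user and file names are hypothetical and describe neither the
Stanford system nor DIRAC.

```python
from typing import Dict, List, Set


class SharedDataBase:
    """Files held at the computing centre (hypothetical model).

    Any subscriber may interrogate a file directly; only users who
    hold the update privilege may insert records.
    """

    def __init__(self) -> None:
        self.files: Dict[str, List[dict]] = {}
        self.updaters: Set[str] = set()

    def grant_update(self, user: str) -> None:
        """Give a user shared responsibility for the contents."""
        self.updaters.add(user)

    def insert(self, user: str, file_name: str, record: dict) -> None:
        """Add a record, creating the file if it does not exist yet."""
        if user not in self.updaters:
            raise PermissionError(f"{user} has no update privilege")
        self.files.setdefault(file_name, []).append(record)

    def query(self, user: str, file_name: str) -> List[dict]:
        """Direct access: nobody stands between the user and the file."""
        return list(self.files.get(file_name, []))


db = SharedDataBase()
db.grant_update("station_1")                       # hypothetical user name
db.insert("station_1", "galaxies", {"name": "example entry"})
visible = db.query("subscriber_7", "galaxies")     # read-only subscriber
```

Note that the administration of the network does not appear in the
update path at all, which is the sense of point 2.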

Although many people agree that this objective is desirable, the 
implementation of such a network raises technical problems from the 
point of view of system design. This activity has to be supported in a 
time-sharing environment, and several crucial parameters had never been 
previously measured. They included: The expected distribution of 
terminal sessions during the training period and during normal use of 
the facility; the statistical distribution of holding times; the level 
of interaction and growth rates of the files in the data-base. The 
experiment we are now going to describe was designed to measure these 
parameters.



V. PROTOTYPE IMPLEMENTATION OF AN ASTRONOMICAL NETWORK

An IBM 2741 communication terminal connected to Stanford's 360/67 was
installed at Dearborn Observatory, north of Chicago, from April 2 to
May 26, 1970. As the psychological reaction of potential users was
expected to be initially negative, there was no formal announcement of
the experiment and use was restricted to graduate students and staff
members who had expressed interest and willingness to participate in the
experiment. Before the installation of the terminal, several basic
astronomical catalogues had been converted to machine-readable form and
stored as DIRAC files at Stanford. These included the Warsaw Catalogue
of Supernovae shown on figure 2, which was punched cover-to-cover for
the purpose of the study; an expanded version of the Yale Bright Star
Catalogue, with a volume of approximately ten million bits; and the
catalogue of Bright Galaxies whose description is given on figure 4.

In eight weeks, one hundred and twenty sessions were logged, and
their main parameters were recorded, leading to the time distribution of
figure 5. The distribution of holding times is displayed on figure 6.
Eight astronomers were involved in the experiments. Only two of them had
previous computer experience at the FORTRAN level. None of them had ever
been exposed to time-sharing.

Figure 5 shows (dotted line) the typical distribution of sessions on 
the Stanford computer, excluding administration and system development. 
(Average of July 1st and July 10th, when the astronomical data-base was
Figure 4: A DIRAC Status Report: The Galaxy catalogue.
Vallee/Hynek - Figure 5
Daily Pattern of Activity (axes: computer local time, Palo Alto, and
user local time, Chicago; curves: DIRAC Data-Base and Other Lines)

Vallee/Hynek - Figure 6
Distribution of Holding Times
From these figures, it is possible to make an educated guess of the
expected additional load placed on a time-sharing system by a scientific
documentation service of the type described, for locations with a known
time difference to the computing facility. Of particular interest is the
fact, reflected on figure 6, that sessions tend to fall into two
categories: one-time interrogation series lasting from 20 to 40 minutes,
and long editing and report-generating sessions with a typical duration
of 90 minutes. (In comparison, the average attach time per session for
all terminals connected to Stanford's 360/67, over a one-month period in
late June and early July 1969, was 17.5 minutes.) Use of a display scope
and a remote medium-speed printer would obviously have saved much of
that time. However, the editing time would have remained considerable,
and we do recommend the use of some local storing facility, possibly
using magnetic card devices, in future systems of this type.
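The kind of educated guess mentioned above can be sketched as follows:
an hourly count of remote sessions, recorded in the users' local time,
is re-indexed to the computer's local time and added to the load already
present. The function names and all the session counts are assumptions
chosen for illustration and are not the measured values of the figures;
the only fact used is that Chicago clocks run two hours ahead of Palo
Alto clocks.

```python
from typing import List


def shift_to_computer_time(sessions_by_user_hour: List[int],
                           user_ahead_hours: int) -> List[int]:
    """Re-index 24 hourly counts from user local time to computer
    local time, given how many hours the user's clock is ahead."""
    if len(sessions_by_user_hour) != 24:
        raise ValueError("expected 24 hourly bins")
    shifted = [0] * 24
    for user_hour, count in enumerate(sessions_by_user_hour):
        # Wrap around midnight with the modulo operation.
        shifted[(user_hour - user_ahead_hours) % 24] += count
    return shifted


def combined_load(existing: List[int], added: List[int]) -> List[int]:
    """Hour-by-hour sum of the existing and the additional sessions."""
    return [a + b for a, b in zip(existing, added)]


# Made-up counts, for illustration only.
remote = [0] * 24
remote[10] = 4      # four sessions starting at 10:00 Chicago time
remote[15] = 2      # two sessions starting at 15:00 Chicago time
remote[1] = 1       # one late session at 01:00 Chicago time

at_computer = shift_to_computer_time(remote, user_ahead_hours=2)
local = [3] * 24    # made-up existing load at the computing facility
total = combined_load(local, at_computer)
```

With these inputs the 10:00 Chicago sessions fall in the 08:00 bin at
the computing facility, and the 01:00 session wraps to 23:00.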

In terms of CPU time, we found that text-editor usage remained at a
level of approximately one hour per month while DIRAC execution and file
system access required three times as much. During a typical 'long'
session of one hour and forty-seven minutes, involving the editing of a
large catalogue, consultation among users by multi-terminal exchange of
messages, and the creation of two new files, text editor usage was 17.72
seconds and DIRAC execution time was 40.26 seconds.
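The figures for this 'long' session can be put in proportion with a few
lines of arithmetic, given here as a Python sketch. The variable names
are ours, and the DIRAC execution time is taken as 40.26 seconds.

```python
# Session length: one hour and forty-seven minutes, in seconds.
session_seconds = (1 * 60 + 47) * 60

editor_cpu = 17.72    # text-editor CPU time, seconds
dirac_cpu = 40.26     # DIRAC execution time, seconds

total_cpu = editor_cpu + dirac_cpu
cpu_fraction = total_cpu / session_seconds     # CPU time per second attached
dirac_to_editor = dirac_cpu / editor_cpu       # ratio within this session

print(f"session length:       {session_seconds} s")
print(f"total CPU time:       {total_cpu:.2f} s")
print(f"share of attach time: {cpu_fraction:.2%}")
print(f"DIRAC / editor ratio: {dirac_to_editor:.2f}")
```

The session thus used just under one minute of CPU time in 6420 seconds
of attach time, that is, less than one per cent. Within this single
session the ratio of DIRAC time to editor time is somewhat lower than
the monthly ratio of three to one quoted above.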

We learned three major lessons from this experiment.

First, we found that the additional load on the time-sharing system was
quite acceptable even in simulated situations where the astronomical
data-base was interrogated and updated simultaneously by two or three
terminals in addition to the main station at Dearborn Observatory. Even
when the system was running at its full capacity of 60 terminals
on-line, use of DIRAC on a number of these did not significantly degrade
performance. This seems to show that a network such as the one
hypothesized on figure 3 is quite within the state of the art in terms
of operating system support, even when the environment includes batch
and other time-sharing activities besides data retrieval.

Second, we could not observe a 'training period' of any significant 
duration: the astronomers usually devoted their first session to 
learning the basic text-editor and DIRAC commands, and started doing 
meaningful research work on the next session. This was quite an 
unexpected result, and it caused the number of users to expand quite 
rapidly beyond our initially very cautious estimates. 

Finally, user acceptance was also better than we had anticipated once 
initial skepticism was overcome. The terminal proved to be a valuable 
teaching aid, and an astronomy class started to compile a galactic 
bibliography in DIRAC format; we had not anticipated this type of 
application. In another typical instance, an astronomer (Dr. Wray) 
interrogated the galaxy catalogue to extract a sub-list of all irregular 
galaxies for which radial velocities were not available, and used it 
that same day to prepare an observing schedule for the observatory's 
40-inch telescope. This particular use of DIRAC exemplifies best the
advantage that can be drawn from new computing techniques in a research 
environment; this particular question, naturally, COULD have been solved 
by hand; but in itself, the simple fact that this study has never been 
undertaken in spite of the fact that the catalogue in question had been 
on observatory shelves around the world for years, seems to indicate 
that the psychological investment required to initiate interrogations of
this type is much lower under the network concept than it is under the
constraints of a retrieval center operating by mail and, a fortiori,
under the constraints of the isolated astronomer working with pencil and
paper.



VI. COST ANALYSIS 

This psychological factor is frequently overlooked when the problem
of information retrieval is discussed among scientists. The question of
cost is always posed - and rightly so! - in terms of the obvious cost
of hardware, and the not-so-obvious cost of software; but it is rarely
balanced against the CURRENT costs of missing information, the cost of
duplication of effort, the cost of slow and inadequate communication
among researchers, the cost of the time lost in menial tasks by experts
who should devote all their attention to data validation and high-level
analyses. Very frequently, for instance, astronomers will say: 'The cost
of maintaining the data about a particular star in a direct-access
data-base cannot be justified when we do not know if this particular
record will be accessed more than once a month'. The answer to this, of
course, is that no astronomer really knows how frequently this
particular record may be needed by himself or his colleagues. No
astronomer is even in a position to enumerate all the contexts in which
the data about this star might be relevant, and no astronomer can answer
the question: 'How much does it cost NOT TO HAVE the relevant
information available when it is needed?'

Assuming now that we work with users who have overcome this mental
block and can grasp the usefulness of the machinery in the
non-mathematical area, then the matter of cost-effectiveness can be
settled in fairly simple terms. We have mentioned earlier the type of
inquiry the ASP proposed for the 'data center' they were considering.
Presumably these inquiries would be sent through the mail, and the
center would operate in a manner similar to MEDLARS. In contrast to
this, we tried to evaluate the cost of extracting and printing a number
of copies of a significant subset of the Bright Star Catalogue from a
remote terminal. The problem is to print offline, on a high-speed
printer, a catalogue of all F-type stars that are the primaries of a
Southern visual double. The DIRAC command the astronomer types is:

Delta < 0 AND mul2 ¬= 0 AND Speluml >= 300 AND Speluml < 400

This command reflects a standard astronomical coding scheme that we have
described elsewhere (6). The retrieval system found 192 such stars.
Another command will now create an image of the catalogue under WYLBUR,
the interactive text-editor with which DIRAC interfaces. This list of
stars, with their parameters, is then available for punching, editing,
or printing off-line. It could also be typed at the terminal or
displayed on a scope. We generated 25 copies of this special catalogue
on a high-speed printer with a final unit cost of fifty cents. The total
time spent at the terminal by the astronomer had been six minutes, and
he had spent 1.4 minutes of computer time.
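For readers more familiar with procedural notation, the selection rule
and the printing cost can be restated as a short Python sketch. The
field names Delta, mul2 and Speluml are those of the command; their
reading as declination, multiplicity code and spectral code is inferred
from the statement of the problem and is an assumption. The four toy
records are invented and are not entries of the Bright Star Catalogue.

```python
from typing import Dict, List

Star = Dict[str, float]


def southern_f_primaries(stars: List[Star]) -> List[Star]:
    """Apply the rule of the command: southern declination, non-zero
    multiplicity code, spectral code from 300 up to (not including) 400."""
    return [
        s for s in stars
        if s["Delta"] < 0 and s["mul2"] != 0 and 300 <= s["Speluml"] < 400
    ]


# Invented records, for illustration only.
toy_catalogue = [
    {"Delta": -12.0, "mul2": 1, "Speluml": 350},   # satisfies all three rules
    {"Delta": 20.0, "mul2": 1, "Speluml": 350},    # northern declination
    {"Delta": -5.0, "mul2": 0, "Speluml": 320},    # multiplicity code is zero
    {"Delta": -40.0, "mul2": 2, "Speluml": 400},   # spectral code out of range
]
selected = southern_f_primaries(toy_catalogue)

# Printing cost from the figures quoted in the text.
copies = 25
unit_cost_dollars = 0.50
total_print_cost = copies * unit_cost_dollars      # 12.5 dollars
```

The procedural version needs a loop, a data declaration and a function;
the non-procedural command states only the rule itself, which is the
point of the comparison.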
The trick in achieving this level of cost-effectiveness, of course,
is that no programmer was involved in the interrogation. While the 'data
center' concept would presumably include a large programming staff, and
would maintain its own computer, we wanted to point out that a network
could utilize an existing time-sharing service and that reliance on
non-procedural programming, as Olle (1) and others have often pointed
out, opened the way to a new and much more economical type of
user/system interaction.

Our main observation, then, throughout this experiment and in
discussions with professional astronomers that will be reported
elsewhere (7), is that time-sharing and generalized programming combine
to provide a suitable environment for efficient and cost-effective
scientific documentation systems. But at the same time we found that the
potential users of these systems were still largely unaware of their
feasibility. We cannot escape the conclusion that the development of
information systems as research tools might be delayed (with a
correspondingly high expense due to duplication of effort and use of
obsolescent software techniques in the meantime) for psychological
reasons. These reasons will only be removed when scientists become aware
of the availability of very large-scale, very cost-effective and very
reliable business-oriented networks later in this decade.
DATA CENTER OR INFORMATION NETWORK ?



A SYMPOSIUM 



(Transcript of a meeting of the American Section of the International 
Astronomical Union Committee on Data Processing, Boulder, Colorado, 13 
June 1970.) 



I: 



Participants : 

-Abt 
-Bixby 
-Duncombe 
-Hynek 
-Kieffer 
Moore 
Neff 
-Roman 
-Sitterly 
-Strand 
-Wackerling 



Dr. Helmut A. Abt, Kitt Peak National Observatory (Chairman) 

Miss Joan E. Bixby, U.S. Naval Observatory 

Dr. Raynor L. Duncombe, U.S. Naval Observatory 

Dr. J. Allen Hynek, Director, Dearborn Observatory 

Dr. L. J. Kieffer, Joint Institute for Laboratory Astrophysics 

Dr. Elliot Moore, New Mexico Institute of Mining & Technology 

Dr. John S. Neff, Dept. of Physics and Astronomy, Iowa State 

Dr. Nancy G. Roman, N.A.S.A. Headquarters 

Dr. Charlotte Moore-Sitterly, National Bureau of Standards 

Dr. Kaj Aa. Strand, U.S. Naval Observatory 

Mr. Lloyd R. Wackerling, Astronomy Department, Northwestern U. 

Names preceded by a minus sign are those of the participants 
who have returned an edited version of their comments. 



(After presentations by Dr. Abt and Dr. Sitterly dealing with existing 
catalogues and with the feasibility of an astronomical data center, 
Mr. Wackerling describes the time-sharing experiment) 

- Abt : Well, maybe it's a good time to stop and let Lloyd take over, 
because he has tried experiments. Let me say one more thing, and that is, 
this data center could be operated by remote consoles, and this is what 
Lloyd is bringing up; because it's possible now to have remote consoles 
for a central computer, have these consoles in various observatories and 
have essentially instant access. An experiment along this line is what 
Lloyd has been talking about. 

- Wackerling : I would like to start by shifting concepts. We talk about 
experiments pointing to the development of a North-American astronomical 
data network, INFORMATION network if you will. The Northwestern 
University Department of Astronomy and the Stanford Computation Center 
have just carried out a two-month joint experiment for the purpose of 
trying to evaluate in a preliminary sort of way how one goes about the 
creation and use of a large data-base by such a proposed network. 

We used the 360/67 of the Stanford University Computation Center's 
Campus Facility and we put on various catalogues. Created them. Updated 
them. And queried them. I have samples of the work which we did, which I 
did just on Monday. We put on the Reference Catalogue of Bright 
Galaxies, this is the one we worked most with. We put on the Dearborn 
Observatory version of the Bright Star Catalogue, we put on the Warsaw 
Preliminary Supernovae Catalogue, and also created various sample 
bibliographic files and worked a bit with them. 

I would like to tell you the scenario we have in mind for the 
creation of such a network. One might think of the establishment, over a 
period like two years or so, of a sizeable data-base, and one might 
think of the creation of the data-base being carried out by those people 
who are rather small in number, who are right now, currently maintaining 
and updating sizeable files of astronomical data, the double star files, 
all the files of the Naval Observatory, de Vaucouleurs' galaxy file... 
one can think of several examples. And, at the end of an initial period 
when a data-base of significant proportions has been established, one 
thinks of making the network hardware and fully-developed software 
available to use by people at large, and essentially the entire 
North-American astronomical community. 

I would like to give you just a few remarks on the concept of the 
network. We want to stress the NETWORK concept, with the eventual active 
participation of the entire accessible population of astronomers, as 
opposed to the concept of the single data center, the single, large data 
CENTER, which is taking the responsibility for: acquiring, organizing, 
validating, verifying, and updating the wide variety of astronomical 
data files, and also on top of that, the responsibility for supporting 
the hardware and developing the sophisticated software technology which 
will be necessary to make this service available to the remote users. 

And in this connection it might be instructive to have the testimony of 
the people who are already in the game as to the feasibility of such a 
center, data center. There are data centers in existence, not in 
astronomy - thinking of other fields. And some people, more 
knowledgeable than I, think that there has not been created a data 
center which has worked. 

(Some existing data center experiments, like MEDLARS and 
SPIRES/BALLOTS, are critically discussed). 

- Wackerling : Another point is that the concept of regional computing 
networks is one which is - making use of a large computing facility - is 
one which is beginning to be acted on right now in the U.S. You have the 
University of California going on a state-wide net, well, it has plans 
to go on a state-wide net, and I think there are plans in other states 
to go on state-wide computing nets. 

- Neff : Iowa has had one running for two years and it is very successful. 

- Duncombe : North Carolina has had one for at least a year. 

- Wackerling : And I think perhaps the next ten years will see great 
changes on network lines, and perhaps great changes in individual 
computation centers as we now know them, because there isn't enough 
talent to go around in maintaining the computation centers, but 
anyway... 

And finally I wonder if you would like to hear described the software 
used. I'll try to do that quickly. The concept, then, that we have in 
mind, is the establishment of a large data-base at a central computer 
installation and its use in an interactive mode in a time-sharing 
environment, from remote locations, by anyone interested. 

- Roman : Can you be a little more specific as to what you have in 
mind ? I understand all the words, but I don't understand... 

- Wackerling : May I tell you what we used ? 

- Abt : Can you give an example ? 

- Hynek : No, what she means is: it's interactive: what do you mean, 
interactive ? 

- Wackerling : I mean you sit down at a console and you talk to a 
computer. 

- Hynek : And also point out the fact that the expert in charge at the 
Naval Observatory for instance, or whatever their responsibility is, 
could update the data-bank, but only that responsible center could. 

- Wackerling : I wanted to come to that. 

- Hynek : Well, fine. 

- Roman : You say you interact with it. I don't understand in what 
way you interact. You ask it for data, and you get data, and then 
presumably you consider this is not really what I want. Is that what you 
mean by interacting ? Or you ask it for data and you don't like it, and 
you tell it: the data you have in there is wrong, you should replace it 
by thus and so. 

- Wackerling : Yes. 

- Hynek : That's right. 

- Roman : I would consider that somewhat dangerous. 

- Wackerling : You've got to have update access to it. 

- Abt : All you're doing is entering in a list to the expert, and the 
next time he looks at that particular group of things that has to be 
done, you have given him a comment that says: 'Look at the data for HD 
1554, it's wrong.' And he'll look, and re-evaluate it. 

- Wackerling : But one has the picture of the double-star expert ... 

- Strand : There we have the problem from the very beginning. People 
will think that when they find a mistake they can immediately correct 
the data in the bank. In regard to the Double Star Catalogs, we have 
insisted on people mailing us their corrections and these would not be 
incorporated until we had an opportunity to verify them. 

- Hynek : Exactly. Without that you're opening Pandora's box. It's 
just a chance of all kinds of errors coming in. 
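
(Editorial illustration. The procedure described in this exchange, in which a user may flag an entry but nothing changes in the bank until the responsible expert has verified the correction, can be sketched roughly as below. This is only a minimal sketch in Python, not the mechanism used in the experiment: the class name, the method names, the star identifier and the values are all hypothetical.)

```python
class CorrectionQueue:
    """Hypothetical holding area for corrections awaiting verification."""

    def __init__(self, entries):
        self.entries = dict(entries)   # the data actually served to users
        self.pending = []              # suggestions not yet examined

    def suggest(self, star_id, proposed_value, comment=""):
        # A user leaves a note for the expert; the stored data are untouched.
        self.pending.append(
            {"star": star_id, "value": proposed_value, "comment": comment}
        )

    def review(self, accept):
        # The expert examines every pending suggestion. `accept` is a
        # callable standing in for his judgement (True = verified).
        for item in self.pending:
            if accept(item):
                self.entries[item["star"]] = item["value"]
        self.pending = []


# Invented identifier and magnitudes, for illustration only.
bank = CorrectionQueue({"STAR 1": 7.1})
bank.suggest("STAR 1", 6.9, "magnitude looks wrong")
value_before_review = bank.entries["STAR 1"]   # still the old value
bank.review(lambda item: True)                 # the expert verifies it
value_after_review = bank.entries["STAR 1"]
```

(The point of the design is that `suggest` never writes to `entries`; only `review` does.)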

- Neff : The security of your system, that puts some constraints on your 
hardware. 

- Wackerling : The system in existence provides three levels of access 
to the data-base, which I'll come to shortly. So anyway, one envisions 
at every remote location a terminal and a scope for extraction of data. 
The machine we run on is a 360/67. It uses OS/360-HASP as an operating 
system. We use the Stanford time-sharing submonitor, which is named 
..., and the submonitor is relied on by the language which we use, 
which is called DIRAC. DIRAC was developed by Dr. Jacques Vallee, who was 
at Northwestern, he is currently manager of information systems at 
Stanford. DIRAC interfaces with and is driven by Stanford's interactive 
text-editor, which is named WYLBUR, and you also have available a 
terminal-to-terminal communication under a system named MILTEN, that is, 
anyone signed on to the network can talk to anyone else signed on to 
the network. You also have the use of a language named ALTAIR, which has 
been described by Hynek and Vallee in P.A.S.P. some years ago, which 
answers questions put to it in a kind of English, I know you would like 
to say 'English'... It's approximately English. This answers questions 
on properties of stars, bright stars at the moment. 

- ? : Can you give an example of conversation ? 

- Abt : You can just quote it. 

- Hynek : If you want just an example of question, you could certainly 
ask: 'Of all stars brighter than a certain magnitude, and earlier than 
^5, how many are members of visual binary systems whose space velocities 
are greater than fifty kilometers per second ?' You could do that. 

- ? : Would you ask the question in exactly that form ? 

- Hynek : Just about. 

- Wackerling : And ALTAIR would come back and tell you if it didn't 
recognise any of the words. 

- Hynek : I forget now whether you would use the word 'double' or 
'binary', but obviously... 

- Abt : It would be possible to have a language such that the computer 
would recognize the word 'double' or 'binary'. 

- Hynek : That's a later... but in ALTAIR it didn't. If you said 
'binary' and it only knew the word 'double', then it would come back and 
say : WORD UNKNOWN. 

- Abt : OK, so you talk to it in English. 

- Hynek : You talk to it in English. 

- Abt : And it replies in English. 

- Hynek : The whole point, however, in an advanced system, is 
essentially not to have the astronomer rely on a programmer to phrase the 
question. He gets down there and types out the thing, and the system 
essentially CONSTRUCTS THE PROGRAM to answer that question. 

- Abt : Because it's a type of question you ask that is fundamentally 
so simple; because you're stating a magnitude or a distance, and you're 
asking is it greater than, or less than... 

- Hynek : Yes, the ALTAIR system gave you numbers and ratios and 
percentages, things like that. If you wanted to prepare an observing 
list of things, you just made out the... It spoils it, in a sense ! 

- Wackerling : In the example I showed you when I loaded DIRAC, I said 
I wanted to query the galaxy file, and then I asked it - this was, 
incidentally, Vaucouleurs' Reference Catalogue of Bright Galaxies - I 
asked it for all the galaxies for which a Yerkes type existed, in one 
of Morgan's two lists. It extracted these; I said, 'retain that sample, 
I want to ask more questions'. I then asked for all the A or F-type 
galaxies of irregular form, and it picked up those, and I then said, for 
how many does a radial velocity not exist, and it picked up those, then 
I said, 'print me out the Yerkes types', and whatever data I wanted to 
extract, I forget what they were, but this is the kind of thing that one 
can do. I have more examples with me, which I'll show you. 
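
(Editorial illustration. The session just described proceeds by successive refinement: a sample is extracted, retained, and then narrowed by further questions. A rough sketch of that pattern in Python follows. It is not DIRAC syntax, which is not reproduced in this transcript; the record layout, the field names and the five sample records are invented for the purpose.)

```python
# Invented records standing in for a galaxy file; None marks a missing value.
galaxies = [
    {"name": "G1", "yerkes": "a", "form": "irregular", "radial_velocity": None},
    {"name": "G2", "yerkes": "f", "form": "irregular", "radial_velocity": 1200.0},
    {"name": "G3", "yerkes": "k", "form": "spiral", "radial_velocity": 800.0},
    {"name": "G4", "yerkes": None, "form": "irregular", "radial_velocity": None},
    {"name": "G5", "yerkes": "a", "form": "spiral", "radial_velocity": None},
]


def select(sample, condition):
    """Return the records of `sample` that satisfy `condition`."""
    return [record for record in sample if condition(record)]


# 1. All galaxies for which a Yerkes type exists; this sample is retained.
with_yerkes = select(galaxies, lambda r: r["yerkes"] is not None)

# 2. Within the retained sample: A or F types of irregular form.
a_or_f_irregular = select(
    with_yerkes,
    lambda r: r["yerkes"] in ("a", "f") and r["form"] == "irregular",
)

# 3. Within that: those for which no radial velocity exists.
no_velocity = select(a_or_f_irregular, lambda r: r["radial_velocity"] is None)

# 4. Print the requested field for what is left.
for record in no_velocity:
    print(record["name"], record["yerkes"])
```

(Each question operates on the sample retained from the previous one, so the user never has to restate the earlier conditions.)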

- ? : How long did all this take ? 

- Wackerling : Oh, that particular one, that took part of an afternoon. 

- Hynek : Well, for the whole thing, but the actual scanning ? 

- Wackerling : Oh, the galaxy catalogue is a rather sizable file. The 
time it takes to scan it varies depending on how the current use is on 
the machine, but typically, during periods of light use, perhaps three 
minutes to scan the file. During periods of heavy use in the Pacific 
afternoon, it took, oh, perhaps twice as much. 

(Machine size and storage capacity are discussed, as well 
as the cost of peripheral hardware). 

- Neff : Let me ask you, if I wanted to ask about a star that is known as 
Alpha Lyrae, how would I ask it ? 

- Wackerling : There is one thing that I didn't say and that Helmut 
brought up. File number ONE should be a file of names of astronomical 
objects. That's something we can start to work on today, you go into the 
file, you say the name field contains 'Alpha Lyrae', and, bang ! You get 
the names of Alpha Lyrae, and number 2 gives you the name that you would 
use in querying the system. 
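
(Editorial illustration. The file of names proposed here is a cross-reference: any known designation leads to the one name used to query the other files. A minimal Python sketch follows, under assumptions of our own: each record is a list of designations for one object, and the first entry is arbitrarily taken as the querying name. The function names are hypothetical, and the choice of the HD or NGC number as querying name is ours, not the speakers'.)

```python
# Each record lists designations of one object. By our own convention the
# first entry is the name used when querying the other files.
NAME_FILE = [
    ["HD 172167", "Alpha Lyrae", "Vega", "HR 7001"],
    ["NGC 224", "M 31", "Andromeda Nebula"],
]


def find_names(name_file, text):
    """Return every record in which some name contains `text`."""
    wanted = text.strip().lower()
    return [
        record
        for record in name_file
        if any(wanted in name.lower() for name in record)
    ]


def querying_name(name_file, text):
    """Return the querying name of the first matching object, or None."""
    matches = find_names(name_file, text)
    return matches[0][0] if matches else None


print(find_names(NAME_FILE, "Alpha Lyrae"))     # all names of the object
print(querying_name(NAME_FILE, "Alpha Lyrae"))  # the name to query with
```

(A 'contains' match is used because that is how the request is phrased in the discussion; a production index would presumably need stricter matching to avoid false hits on short names.)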

- ? : But it (increases the size ?) of your little system. 

- Wackerling : I created a sample one from the Messier Catalogue. 

- Abt : This is what I mean. For instance, in looking at the old 
literature, you find a lot of Lalande numbers, which are no longer used. 
One of the first things you have to do is cross-reference all these 
catalogues. 

- Strand : And how about Wolf numbers ? 

- Abt : They should be cross-referenced too. 

- Neff : Have you been trying to look up Wolf number 59 ? (laughter) 

- Abt : If you wanted identification, you might automatically list the 
common designations for stars, such as the ... number, the BD or CD number, 
... number, the HD number, the name and perhaps the RC number. A more 
extensive list, such as Wilson and Wolf and Boss numbers, could be 
printed by request. 

- Strand : That's a big file right there, the identifications alone ! 
Which, incidentally, I would like to have a copy of. 

- Hynek : Maybe we should start... 

- Roman : You need some of the old things. 

- Abt : So that, anything you do along these lines, IF you only stopped 
at intercomparing the catalogues, and printed out what Mount Wilson 
number corresponds to what BD number, you've already done something 
useful. 

- Wackerling : Somebody mentioned the organisation ... 

- Strand : It's going to create a big center but not very many great 
astronomers, you know. The astronomers are going to rely on centers for 
everything. 

- Wackerling : The astronomers will be relying on themselves to 
maintain the data-base. 

- Hynek : The whole is not greater than some of its parts ! 

- Roman : It seems to me that all you've been talking about is how 
you'd have a nice, convenient access to the data center, versus sending 
a question and getting back an answer some time later. I don't see how 
much you're going to get out of the data center unless you have data put 
in first. 

- Wackerling : It's the PEOPLE who make the data center. I've been 
talking the way I do because of things I've been hearing and the things 
I've been reading, especially the COSATI reprint. 

- Roman : I don't understand how you avoid the data center in your 
system. 

- Wackerling : The data center is a machine. It's not a staff of people. 
That's the difference. 

- ? : Somebody has to put the information in. 

- ? : Somewhere on this computer there has to be a set of disk packs, 
or if they are not on somebody has to put them on. 

- Hynek : Let's suppose that Observatory 0 is responsible for... double 
stars, let's say. They're a staff of professional people located in 
Oshkosh or some place. They are the ones who are responsible, and only 
they, for updating this thing. Now, on their terminal they create the 
catalogue at the data center, but the center is not people. It is a 
machine. In other words, the double star experts don't have to be 
housed in this center. That would get terribly top-heavy. 

- Roman : That's a trivial problem. 

- Hynek : What ? To have all the double star experts, all the galaxy 
experts, all the... 

- Roman : No, where you house your experts is a trivial problem. 
You have to have a central file of information. 

- Hynek : That's the machine, the disk. 

- Roman : But how do you compile your data ? 

- Hynek : That's the responsibility of the people who have that. 
Suppose the IAU decides that astronomer Z and his group are responsible 
for radio-sources. Then only the people responsible, given that 
responsibility, have the right and access to the central data-bank. And 
they... 

- Wackerling : In update. 

- Hynek : In updating it, yes. In that category. They could not come in 
and mess up, say, your double star catalogue. 
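
(Editorial illustration. The arrangement Dr. Hynek describes, in which anyone may query but each category of data can be updated only by the group made responsible for it, amounts to a small table of update rights. The Python sketch below is a hypothetical rendering of that idea only. It does not reproduce the three access levels of the existing system, which are not detailed in this transcript, and the group and category names are invented.)

```python
# Hypothetical assignment: category of data -> the one group that may update it.
UPDATE_RIGHTS = {
    "double stars": "observatory_A",
    "radio sources": "group_Z",
}


def may_query(group, category):
    # Query access is open to every group for every known category.
    return category in UPDATE_RIGHTS


def may_update(group, category):
    # Update access is confined to the responsible group, in its category.
    return UPDATE_RIGHTS.get(category) == group


def update(catalogues, group, category, key, value):
    """Write one entry, refusing any group not responsible for the category."""
    if not may_update(group, category):
        raise PermissionError(f"{group} may not update {category}")
    catalogues.setdefault(category, {})[key] = value


catalogues = {}
update(catalogues, "observatory_A", "double stars", "pair 1", "revised entry")
```

(Under this scheme the radio-source group can read the double star catalogue but any attempt by it to write there is refused.)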

- ? : What happens if the observatory responsible for updating the 
double star catalogue decides they don't want to do it ? 

- Abt : That has happened in the past, but you give it to another 
observatory, such as the transfer of the Jeffers double-star catalogue 
from the Lick Observatory to the U.S. Naval Observatory. 

- Hynek (To Miss Roman) : I think it's a matter of semantics, 
Nancy, really, what you mean by 'Center' here. 

- Abt : I would like to open a big discussion. I detect there is a lot 
of questions here you'd like to ask. 

- Strand : I think you should allow Dr. Duncombe to say a few words 
here, since we have had such a center longer than most. 

- Hynek : That's true. 

- Abt : All right, may I interrupt this and ask Ray to make any 
comments he'd like. 

- Duncombe : Thank you, Helmut. We've had some experience in this, over 
the last few years, in the role of a data center in certain aspects of 
data. We have tried to assemble star catalogues in machine-readable 
form, these data being distributed in Europe by the Astronomical 
Rechen-Institute of Heidelberg, Germany, and by Her Majesty's Nautical 
Almanac Office, Sussex, England. In cooperation with COSPAR, we hope to 
establish a data exchange center, probably located in Paris, for 
ephemerides for space research, star catalogues, and observational data 
generated by the various laboratories involved. 

- Abt : You told me, I think, of two, I thought they were separate. 
One was a catalogue of catalogues available on tape or cards, and the 
other is a cooperation between IAU Commission 4 and COSPAR for a 
Committee of which you're Chairman, formed to consider a catalogue for 
space observation, which is mostly short-lived objects. Is that wrong ? 

- Duncombe : I think something slipped in the interpretation. 

- Abt : Then there is another thing here, in which there is a proposal 
for a center in Paris which would direct interested people to where data 
can be obtained. 

- Duncombe : This is exactly the point I'm speaking to. 

- Abt : But, this center would not necessarily store the data itself. 

- Duncombe : This is precisely the point. 

- Abt : Alright. 

- Duncombe : This Center would act as an information center to direct 
users to the data groups that are generating the data, or that already 
have the data available. In other words, the group, the office in Paris, 
would NOT store all the data. It would merely act as a clearinghouse for 
requests for data, or requests for observations and refer them directly 
to the groups who have the expertise in either making the observations 
or analyzing them, and so forth. 

- Abt : How does this differ from just an updated version of your 
Circular 114 ? 

- Duncombe : In Circular 114 we refer only to the data that U.S. Naval 
Observatory has. 

- Abt : Oh. But it could be expanded to say that these are the known 
catalogues available on cards or tape, and they can be obtained from 
a given observatory. 

- Duncombe : Exactly. 

- Hynek : How about compatibility of such data ? Suppose I write to 
Paris and want a certain catalogue. Don't I have to have the proper 
computing machine so that the data, the format is compatible ? Or can 
that all be straightened out ? 

- Duncombe : You get a format of the data right along with the data. 

(Compatibility and exportability of data are discussed) 

- Abt : I could see difficulties, in the sense that identification may 
be different, you may run into identification problems. 

- Duncombe : The point I'm trying to make is this. We refer people to 
the particular group that has expertise in that area. Here you are 
assuming that all of those various groups are going to introduce their 
data into one large data-bank, for YOU, and MAINTAIN it for you, so that 
people can go to one place to get the data. But you're making more work 
for the people who are the experts in various areas, in putting the 
burden on them to keep your data-bank up-to-date. It becomes their 
responsibility, not yours. 

- Hynek : All right, don't they have that burden right now ? 

- Wackerling : They have it already. 

- Duncombe : They have this burden right now but to a limited extent. 
When someone asks me to send a copy of some particular catalogue, I 
tell them, "All right, I have a version, and I know there are errors 
in it, but you're welcome to it under those circumstances". How are we 
going to warn the user of this data bank ? He goes to that and he assumes 
he is getting the most up-to-date, correct information available. 

- Hynek : That would also have to be indicated. 

- Duncombe : Well, all right, this is a variation on what you proposed. 

- Hynek : Yes, of course I should point out, as Lloyd pointed out, that 
this thing works on one continent, the North American continent. I don't 
see how you'd have a terminal in Paris connected to a universal data 
bank some place. 

- Duncombe : Oh no, this is perfectly true, but this is an objection that 
might be raised. 

- Strand : The process of verifying the data will be slow. By the time 
this is done the user might no longer be interested. 

- Roman : I just want a clarification: is this a place where you 
could send a list of a hundred stars and, say, they give you the 
photometric observations of these stars, or would they only give you all 
the places that had photometric data ? 

- Duncombe : I believe that initially, all they will do is tell you 
where the photometric data is. 

- Roman : Which is not really very much help. 

- Duncombe : That's the problem. 

- Roman (laughing) : As you know ! 

- Duncombe : The second objective, I believe, of this 
assemblage of data, in one place, is that the user can query it and get 
all of the photometric data on a particular object. 

- Abt : Now, what you've been talking about and what Lloyd has been 
talking about are complementary, because Lloyd has been talking mostly 
about this, and you've been... 

- Duncombe : Yes, but let me proceed: I see no difference as far as 
objective one is concerned, between what you're proposing and WHAT WE DO 
NOW among centers of expertise... 

- Hynek : TIME is perhaps the difference. 

- Duncombe : Except that you're putting a greater burden on the people 
who supply the corrections of all the material, in order to keep your 
data-bank up-to-date. Your second objective to have all of the data 
available for interpretation is excellent, but how much 
is it going to cost you ? How much does it cost you, for instance, to 
keep one item of information on a star, in storage, and have it 
accessed, say once every two years ? How much is it going to cost you 
to retrieve that ONE BIT of information, and how much does it cost you 
to analyze data ? I have had HORRENDOUS experience running through 
files of star catalogues to try to find different designations for the 
same star, among star catalogues. It takes a TERRIFIC amount of time to 
do this type of analysis. 

- Abt : Once you've done it for a star, why should somebody else have 
to do it for the same star again in three months ? There is a question 
in the back . . . 

- Kieffer : I don't want to interrupt, I'm somewhat of an outsider, 
but I have the distinct feeling that you're all going down a very 
strange path. You're very concerned about computers and computer 
use, and yet I don't think that you really know exactly what you want to 
do with them. Let me say this about that sort of problem. I operate a 
center which has gone down this road in a modest way, and there are three 
serious problems. Up to this point, as far as I know, large data-base 
systems have all failed. The reasons that they have failed (these are not 
my particular reasons) are first of all, goals are not well-defined and 
they change. Second, changes in hardware, and finally changes in 
software. Now, all three of these things seem to combine, to absolutely 
SINK large data-base systems. These problems should be carefully analyzed 
for the system you're proposing. If you really understand what the basic 
information is that you have stored in the data-base, and exactly what 
people want to know about it, it's absolutely MUCH BETTER to print it out 
and send it to people rather than have every one continually query your 
computer. It's EXTREMELY expensive. My understanding is that you are 
operating on very modest budgets. You ought to recognize that from the 
beginning, and not spin your wheels concerning very sophisticated 
hardware and software. I have a very good idea of what software costs, 
because we have to generate a lot of our own, for a much simpler system. 

- Abt : Yes, that is a good point. In defense, I can say the 
astronomers have been putting together catalogues for hundreds of years, 
and they have a lot of experience, and it's not necessarily expensive. I 
could guess how much it costs... 

(Discussion of the amount of work involved in 
catalogue preparation. Dr. Abt points out it has not 
been a very large expense in the past) 

- Kieffer : I didn't say that. What I'm saying is that if you really 
want to use a sophisticated computer system, you really are making an 
order-of-magnitude jump from the kind of thing you're talking about, 
which is, I think, still the best way to do things. In many ways... 

- Abt : You're saying, this is expensive. 

- Kieffer : No, I'm saying that instead of producing a hard-bound 
copy ONCE for people, you have this data all stored, either on 
disk, tape or whatever, and continually search that to get 
little bits of information, that's something that is orders of magnitude 
more expensive, and that's what I gather we have been talking about. 




(The cost of the Stanford/Northwestern experiment is discussed, 
but the cost analysis was not yet available. It is presented below 
in the section 'Analysis of an experiment with direct-access 

- Hynek : Well, the gentleman in the back of the room said that even 
before a hard cover is out, it's out of date, and you can't update. 
Do you know of any catalogue that is perfect ? And doesn't need updating 
as soon as it is out ? 

- Roman : Yes, but you're going to have that trouble anyway. 

- ? : Continually. You're continually updating catalogues. 

- Hynek : It's a lot easier to update the data bank than it is to print 
a new catalogue. 

- Abt : I could see... 

- Strand : We're going away from the subject, of how much it is going 
to cost, this is what I'd like to know RIGHT NOW. 

- Hynek : Well, we don't know the answer to that as yet. 

- Strand : Well, then don't propose something unless you want to have 
it, unless you have some kind of an answer of what the cost will be. 
On the basis of this, let astronomy in the United States decide whether 
we should spend, for instance, two million dollars on such an 
astronomical data network, or whether the money should be spent on 
research. This is what it comes down to. 

- Hynek : You're absolutely right, this is what it does come down to. 

- Duncombe : This is what it really comes down to. 

- Hynek : And I think that is where the philosophy of this thing has to 
come in. NSF is supporting a fantastic amount of material for data 
GATHERING. Now, shouldn't there also be an ARPA for the data processing 
and data use ? I mean, what's the point of putting millions of dollars 
into getting more and more and more data, if it becomes increasingly 
difficult to use it ? This is what we are faced with. 

- ? : You would have to demonstrate that that's the case. You have to 
demonstrate that a typical astronomer is missing data he should use in 
solving whatever the problem is... 

- Abt : Well, I can think of a very good example of a problem which 
suddenly became urgent about fifteen years ago and the data center would 
have answered instantly or in a matter of minutes, and that is: what 
cepheids are known to be in open clusters ? Because the realization 
that there are cepheids in open clusters had quite an impact on 
astronomy about fifteen years ago, and that could very easily have been 
answered. 

- Roman : That probably is still not beyond the reasonable 
capability of answering by hand with an existing catalogue. 

- Abt : Well, it was done by hand, but it took months to do. 

- Roman : A more serious question is to take a list of eighth 
magnitude stars and look for their spectral types or their photometry... 

- Hynek : Or their space motions. 





- Abt : Can anybody give an example of really urgent problems that may 
have a big impact on astronomy but cannot be done by hand, probably like 
asking the number of cepheids in clusters. Something you really have to 
do with a computer ? 

(The discussion turns to other retrieval systems) 

- Abt : Well, the astronomers of the world now turn out something like 
1100 and maybe 2000 papers per year. It's increasing over time. How many 
of those things, how many of these papers can any one person read, or 
scan, or even remember, and even in a relatively narrow part of 
astronomy ? 

- ? : That's what the AJB (Astronomischer Jahresbericht. Ed.) Center... 

- Abt : Yes, but the AJB is very poorly set up to recover all the data, 
because you have to look in every single volume. If you want to find 
out what's known about HD-so-and-so, do you want to sit down and look 
through seventy volumes of the Jahresbericht ? 

- Bixby : When you consider the cost of maintaining a computer terminal 
which you're perhaps using once a month to ask about 
HD-something-or-other, as compared to the cost of sitting down with the 
Jahresbericht, there is just no comparison ! 

- Abt : But the point is, people don't do it, and therefore the... 

- Bixby : But they do it ! 

- Abt : They do it ? Well, yes, in part, but it's more likely that 
they overlook past things. And I think our research will gradually 
deteriorate because people will not be aware, for instance, of the fact 
that when they work on a star, it may be a double star system and 
therefore may have a composite spectrum. 

- Roman : The main thing is, you can't find out what's known on 
HD-something-or-other unless HD-something-or-other was sufficiently 
important to have a paper all to itself. If it was just one of a hundred 
stars in the paper you're not going to find it in the Jahresbericht. 

- Hynek : Yes, that is a good point. 

- Abt : Ray, you...? 

- Duncombe : I was about to say: THERE IS NO SUBSTITUTE FOR DILIGENCE 
AND INDUSTRY. 

- Hynek : Hear, hear ! 

- Duncombe : Even your proposed astronomical data bank won't do it. 

- Sitterly : The EXPERT cannot be moved out. I think this is very
important in all your plans. His careful planning is required for the
success of any data center. Reliability of data given out to an
inquirer is the important factor.

- Hynek : Oh no, you're absolutely right, you don't... But the point
is: Is astronomy going to be left in the rear, and the medical
profession is doing this, the legal profession is...

- Sitterly : But they aren't successful.

- Hynek : Maybe at the present, all right; now does that mean that ten
years from now they won't be doing it successfully ? Maybe we should
profit from their mistakes !

- Abt : Do you think it's too early to do this ?

- Sitterly : I do. More groundwork is needed.

- Abt : You think we should wait ten years or so ?

- Neff : I think a minimum thing that could be started, in which I take
there was a consensus, would be a star index: just a simple-minded star
index.

- Abt : Like Lloyd in Illinois starts...

- Neff : You know, where you have all the names...

- Hynek : All the names.

- Neff : All the names of the stars known, and what is the information
that's available.

- Roman : And right ascension and declination.

- Hynek : Yes.

- Abt : How far would you go ?

- Neff : Well, any star that has a name ought to be in it.

- Roman : Name or number.

- Strand : We have names of stars as faint as twenty-first magnitude.

- ? : If anybody has bothered to name it, it should be in the index.

- Neff : Maybe I'm not posing the question correctly, but it seems to me
there was a consensus that a star index of some form would be valuable.

- Hynek : Yes, but it's one of these things where there's a big ditch,
you can't jump a quarter of the way. You have to make the total jump,
and you either have a total data system that will be used for all sorts
of things, or you don't have a data system.

- Bixby : Can't you start small, by having, say, a continuation of what's
already done with star catalogs in a more organized form: One group is
working with star catalogues, and another group is working with spectral
types...?

- Hynek : That's exactly what we've proposed. AND WE WANT TO CONNECT
THEM BY A NETWORK.

- Bixby : Yes, but this...

- Roman : It would encompass the whole world ?

- Hynek : No, no, no, well...

- Strand : You can do us a favor. We need identifications of stars from
various catalogues, and an astrographic catalogue, would you like to do
that for us ?

- Hynek : Why, sure !

- Strand : That would save us a lot of money.

- Abt : That's a big problem, because different catalogues have
positions that are not the same equinox. You can't just take a
coincidence, you have to give a tolerance to the coincidence, and even
then it's not reliable... (laughter)
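
(Editorial illustration.) The difficulty raised here, matching stars between catalogues whose positions refer to different equinoxes, can be sketched as follows. This is only a minimal sketch under stated assumptions, not anything taken from DIRAC or from the experiment: the function names, the dictionary layout of a catalogue, the common equinox of 1950 and the low-precision annual precession constants are all choices made for illustration. Positions are first brought to a common equinox and then compared within an angular tolerance. A star may still receive zero or several candidates, which is the unreliability the speaker mentions.

```python
import math

# Approximate annual precession rates in arcseconds per year.
# These are low-precision textbook values, assumed here for illustration.
M_ARCSEC_PER_YEAR = 46.1    # precession in right ascension
N_ARCSEC_PER_YEAR = 20.04   # precession in declination


def precess(ra_deg, dec_deg, from_equinox, to_equinox):
    """Shift (RA, Dec) in degrees between equinoxes (low-precision formula)."""
    years = to_equinox - from_equinox
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    d_ra = (M_ARCSEC_PER_YEAR
            + N_ARCSEC_PER_YEAR * math.sin(ra) * math.tan(dec)) * years / 3600.0
    d_dec = N_ARCSEC_PER_YEAR * math.cos(ra) * years / 3600.0
    return (ra_deg + d_ra) % 360.0, dec_deg + d_dec


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds, using the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def cross_match(cat_a, equinox_a, cat_b, equinox_b, tol_arcsec,
                common_equinox=1950.0):
    """Map each name in cat_a to the list of cat_b names within tolerance.

    Each catalogue is a hypothetical dict: name -> (ra_deg, dec_deg).
    An empty list means no counterpart; more than one means ambiguity.
    """
    b_common = {name: precess(ra, dec, equinox_b, common_equinox)
                for name, (ra, dec) in cat_b.items()}
    matches = {}
    for name_a, (ra, dec) in cat_a.items():
        ra_c, dec_c = precess(ra, dec, equinox_a, common_equinox)
        matches[name_a] = [
            name_b for name_b, (rb, db) in b_common.items()
            if separation_arcsec(ra_c, dec_c, rb, db) <= tol_arcsec
        ]
    return matches
```

A brute-force comparison like this is adequate only for small catalogues. Proper motions and catalogue errors are ignored, so the tolerance has to absorb them.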

- Strand : We know all about that. We have discussed it at the Maryland
Astrometric Conference.

- Neff : I think that I would like to propose that we start with something
we can do, preferably something small, as a pilot project, then, keep a
careful accounting of what it costs...

- Abt : Yes but, John, can you be a little more specific, because we
just had a pilot project.

(Confusion)

- Duncombe : Look, I'm not trying to sledgehammer this, all I will say
is, you can make a computer do anything. It's how much it costs you to
do it.

- Abt : And also, how much do you gain out of it ?

- Duncombe : Just get the balance: The benefit to you, against how much
it's costing, and how many other demands there are for the same dollar.
How much research..

- Roman : And the same expertise.

- Abt : Now, that will come in the proposal, because presumably there
will be a proposal from Hynek and Wackerling and Jacques Vallee or
somebody else, and they will estimate costs. Then the NSF or NASA or
whoever is the recipient of this proposal will have to decide: Do we
want to spend a certain fraction of money to do this, or fund
some other projects. But, the only thing that we should discuss is the
capabilities and astronomical possibilities of an astronomical data center,
how important they are. So, if you have any suggestions, some
additional ones, I'd be glad to hear them, because this Committee is
going to terminate, and it will depend on the initiative of somebody to
do the next thing. Any other comments ? Well, it's time to stop.
Thanks very much for your time.

(Many people leave the room. The discussion continues among a
small group)



- Abt : We've learned two things here: You've got to get some cost
estimates, and the other is, you have to solve this problem yet.

- Wackerling : I disagree, we certainly do have in mind that problem
exactly. That's one thing I wanted to remark, there is a catalogue...

- Abt : Like the planetary nebulae ?

- Wackerling : That's the format, that's very much the format that one
thinks of. I did indeed type a sample of the planetary nebulae
catalogue, to see what it looked like.

(Discussion of budgetary levels in astronomy)

- Hynek : Well, the point is, it's like the old Pillsbury Flour
advertisement: "Eventually... why not now ?" Eventually, some much more
evident way has got to be found to handle the astronomical data. Now,
this may not be the way to do it, but anyway we've started thinking
about it, that's the important point. Somebody will come up with a
really...

- Wackerling : Trouble is, there are needs for this type of thing right
now.

- Hynek : See, if we have a console at the Observatory, and I want to
get a certain star, I can get the finder field with precessed positions.
You wouldn't want to have a computer just for that, but...

- Bixby : It would be nice... I wouldn't see that you would use it just
for access...

- Hynek : You probably wouldn't be using it as much as some of the
people at the University of Wisconsin. It could be that some of the
people who are generating the data in that particular field would not
need it, because they got it there.

- Bixby : Yes.

- ? : Now if you suddenly develop an interest in radio-sources, now you
might want to have access to...

- Bixby : In some ways it might seem useful to copy the entire
catalog and to work with your own computer. It might be more
economically feasible and easier to adapt to a particular application
than to try to work through a console.

- Hynek : On the other hand, that thing may get to the point where
it was cheap to have it.

(Discussion of hardware available at Naval Observatory)

- Hynek : Well of course, she brings out an important thing, that they
are specialized, so specialized in their data that they are the ones who
are gathering it and USING it, they don't have any need to go and query
a data bank for data they've already got.

- Wackerling : Right. But one envisions the Naval Observatory as
being a 'participant' in the final...

- Bixby : The way things are run at the moment, we are generating data,
and we are giving people copies. There have been times when it has been
a burden on people to get a corrected copy of the data requested
prepared for someone. This makes me wonder if a researcher would be willing
to stop for three or four months to help set up a data bank. That's three
or four months lost from the research.

- Wackerling : That's right, but people are doing this. Arne Slettebak
does maintain a file of stellar rotational velocities, you maintain
hundreds of records, and if this was presented to you as a tool, and it
were useful to you in your data compilation and your data updating,
this is how one would hope to make it useful to YOU, you see ?

- Bixby : Yes.

- Wackerling : But, as I say, we have to make our case very much more
clear.

- Bixby : Also, there is a certain element of trust that
you have to have in the system. So far when presented with compilations
of data most people generally try to do some checking or COMPLETELY
check to see that that's a complete list and, well, many times you find
it's not, and this kind of a system...

- Wackerling : But on the other hand, everyone does take Arne
Slettebak's word that this is indeed the best stellar rotational velocity
for blah blah blah, and if it's certified by A. Slettebak November 10th,
1971, one perhaps trusts the data..

- Hynek : But this is something that covers the whole waterfront, I
mean no matter...

- Wackerling : This is the point, you need CRITICAL data evaluation, a
file of critically evaluated data.

- Hynek : Take any catalogue, take that supernovae catalogue that came
from Warsaw, how do we know what its critical accuracy is ?
But if we were in closer communication with them, we could find out and
test it, and check it. And heaven knows, even in the Bright Star
Catalogue... Well, we'll see what the problem will be... Can you drop
me over at the Travelodge ?
ANALYSIS OF AN EXPERIMENT

WITH DIRECT-ACCESS ASTRONOMICAL CATALOGUES

J. Vallee



This report discusses the technical and economical aspects of an
experiment with a prototype scientific network, which has been described
above in terms of the contents of its data base and its user interface.

The experiment itself took place from April 2 to May 28, 1970. An IBM
2741 Communications Terminal owned by the Stanford Computation Center
was shipped to Evanston and installed at the Dearborn Observatory for
use with a dataphone coupler. Three major catalogues were created and
stored on direct-access disk packs at Stanford. The Warsaw catalogue of
supernovae was converted to machine-readable form in DIRAC
labelled-input format for the purpose of the experiment. The Dearborn
Observatory version of the Bright Star Catalogue was created directly
from the existing tape through DIRAC's positional input processor, and
so was the Reference Catalogue of Bright Galaxies obtained from the
University of Texas. Emphasis was then placed on the creation of
approximately ten more experimental files reflecting the personal
interests of the users: these included a Messier catalogue, a galactic
bibliography, etc.

Throughout the experiment, a careful account was kept of the 
following data: 

1. The parameters of each session.
2. The computer time and service charges, broken down by category.
3. The telephone line charges.
4. The various other expenses such as travel and training.
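
The four kinds of record listed above lend themselves to a simple per-session ledger. The sketch below is a hypothetical illustration of such bookkeeping, not the accounting procedure actually used in the experiment. The `Session` fields, the category labels and the `summarize` helper are all assumed names, and any figures fed to it would be placeholders rather than data from this report.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Session:
    """One time-sharing session (hypothetical record layout)."""
    holding_minutes: float                                 # a session parameter
    computer_charges: dict = field(default_factory=dict)   # category -> dollars
    telephone_charge: float = 0.0                          # line charge, dollars


def summarize(sessions, other_expenses):
    """Total the costs by component and compute each component's share.

    other_expenses is a dict such as {"travel": ..., "training": ...}.
    Returns (totals, shares, grand_total, mean_holding_minutes).
    """
    totals = defaultdict(float)
    for s in sessions:
        for category, amount in s.computer_charges.items():
            totals["computer:" + category] += amount
        totals["telephone"] += s.telephone_charge
    for name, amount in other_expenses.items():
        totals["other:" + name] += amount

    grand_total = sum(totals.values())
    shares = ({k: v / grand_total for k, v in totals.items()}
              if grand_total else {})
    mean_holding = (sum(s.holding_minutes for s in sessions) / len(sessions)
                    if sessions else 0.0)
    return dict(totals), shares, grand_total, mean_holding
```

Keeping the computer charges as a per-category dictionary allows the cost breakdown to be recomputed later under a different grouping without changing the session records.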

The object of this effort was to identify qualitatively and
quantitatively the parameters of the interaction between a group of
scientists, non-programmers, located far away from a computing center,
and a data-base with which they could only communicate through the
generalized software itself.

It was of particular interest to observe the distribution of the
sessions and the nature of the work accomplished; to analyze the
distribution of holding times; to study the causes of the failures that
occurred; to break down the cost of the experiment according to its main
components; and finally to try and form a picture of the economics of a
fully-implemented network. We propose to study these items in turn.



I. ANALYSIS OF THE MAN/SYSTEM INTERACTION.

A large time-sharing system, such as the one running at Stanford,
develops a well-defined pattern of behavior as a function of time,
number of users and mix of jobs in its various partitions. Before the
development of DIRAC early in 1970, information retrieval applications
had only minimal impact on this environment, and when it occurred this
interaction was not measured. The release of DIRAC or of some similar
tool to a potentially large user population would have repercussions on
the system, and it was important to understand this phenomenon. This
provided the main motivation for Stanford's support of a limited



